Identifying data terms can improve cybersecurity efficiency
The term “data” is vague. Knowing the types of data helps companies protect themselves and better recover from a cyberattack.
You may know where your data is, but do you also know what the data consists of? “Imagine you’re at a party. You ask someone you’ve just met what they do for a living, and they answer: ‘I work in data,’” said Sky Cassidy, CEO of MountainTop Data, in an email interview. “They might as well have said ‘stuff.’ It really means nothing on its own.”
Cassidy, familiar with the vagaries of data collection and its usage, is on a quest to make sense of what he calls misunderstood data. Besides reducing confusion, there are cybersecurity-related benefits to knowing more than “it’s data,” especially when cybercriminals come calling.
Cassidy’s point is that, when it comes to data security, knowing what’s inside the container called data is vitally important both to those who use the data and to those who manage it. To get everyone on the same page when talking about data, Cassidy uses the following categories:
Numbers (quantitative data): This is the most common category because numbers are easy to analyze and, if needed, can be used to create more pertinent data. An example might be deriving Key Performance Indicators (KPIs) from Test Procedure Specification reports (Note: This is not the “Totally Pointless Stuff” report made famous in the movie “Office Space”). To make this category less nebulous, Cassidy suggests not calling it simply data when you are referring to something specific, such as sales totals.
Non-numerical data (qualitative data): Cassidy said that if it cannot be represented by numbers, it’s a good bet the data is qualitative. “The number of website visits or leads would be quantitative, but the URLs people visited, the timestamp, and other information that is more than just a count is qualitative data,” he said.
Once again, saying there is a need for data does not get the job done or make friends. It’s best to be as descriptive as possible; for example, addressing data specific to website visitors is preferred over simply saying “website data.”
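As a rough illustration of the distinction, the sketch below takes a handful of made-up website visit records. The count of visits is the quantitative piece, while the URLs and timestamps are what Cassidy calls qualitative. The `Visit` type, its field names, and the sample values are hypothetical and are not taken from Cassidy.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    url: str        # qualitative: which page was viewed
    timestamp: str  # qualitative in Cassidy's sense: more than just a count


# Hypothetical sample records, invented for illustration only
visits = [
    Visit("/pricing", "2021-03-01T09:15:00"),
    Visit("/blog/data-types", "2021-03-01T09:17:30"),
    Visit("/pricing", "2021-03-01T10:02:11"),
]

# Quantitative: a simple count of visits
visit_count = len(visits)

# Qualitative: the distinct pages people actually looked at
pages_visited = sorted({v.url for v in visits})

print(f"Website visits: {visit_count}")
print(f"Pages visited: {pages_visited}")
```

Naming the two results separately ("website visit count" and "pages visited") is the kind of specificity the article argues for, rather than lumping both under "website data."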
Big data: This type of data consists of very large sets of unstructured data. Cassidy referred to buying habits collected by stores as an example, which would likely include:
- What is bought?
- When was it bought?
- What was the cost?
- What type/category of product?
- Who bought it?
“The data collected over time on a single shopper through their use of a rewards card or something similar would not be considered big data, but that same data on every shopper in the US would be big data,” he said. “Other examples would be stock exchange traffic, roadway traffic patterns, weather data, and the information collected by every app on every phone all the time.”
Dark data: This category covers information that is created but seldom, if ever, looked at or used. Cassidy cites the billions of photos and emails stored online as examples. “Basically, the data equivalent of everything you put in that storage shed because you might need it, but never do.”
Database: Although this category seems too broad, Cassidy collapses it into data used in direct sales and marketing, also referred to as lists, marketing lists, sales lists, direct marketing data, campaign lists, or target lists. “This is the database of prospects or clients that includes things like, company name, address, phone number, contact name,” Cassidy said. “Using ‘database’ rather than ‘data’ will help prevent confusion, but it is recommended to go even further and say the type of database it actually is, such as sales database, marketing database, or client database.”
Analytics: Cassidy said analytics is not a data category; it’s a process for analyzing raw data in order to draw conclusions about that information, which in a sense means moving from the vague term data to specific components that have real value. Rather than stopping at raw website traffic information, analysis can be used to determine which products and services people are most interested in and which ads are driving the most traffic.
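A minimal sketch of that step from raw traffic to something specific might look like the following. It assumes each raw record is simply a pair of the page visited and the ad that referred the visitor; the record layout, page paths, and ad names are all hypothetical.

```python
from collections import Counter

# Hypothetical raw traffic records: (page visited, ad that referred the visitor)
raw_traffic = [
    ("/products/widgets", "spring-sale-banner"),
    ("/products/gadgets", "newsletter-link"),
    ("/products/widgets", "spring-sale-banner"),
    ("/products/widgets", "search-ad"),
]

# Tally visits per product page and per referring ad
page_interest = Counter(page for page, _ in raw_traffic)
ad_traffic = Counter(ad for _, ad in raw_traffic)

# most_common(1) returns a one-item list of (value, count) pairs
top_page, top_page_hits = page_interest.most_common(1)[0]
top_ad, top_ad_hits = ad_traffic.most_common(1)[0]

print(f"Most-viewed product page: {top_page} ({top_page_hits} visits)")
print(f"Ad driving the most traffic: {top_ad} ({top_ad_hits} visits)")
```

The input is still "website traffic information," but the output is a pair of named findings that someone can act on.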
How does this help with cybersecurity?
Cassidy’s goal throughout has been to point out the need to replace sweeping terms, such as “data,” with specific terminology, such as “customer-contact database.” Doing so lessens the chance of confusion, and less confusion is always good.
Imagine a company with several remote locations that finds itself in the middle of a data breach. If every location were on the same page about which specific database was under attack, rather than just saying data was leaving the building, it would narrow down the threat. Knowing the different types of data helps the company pinpoint exactly what kind of data is compromised and what to do about it.
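One way to picture this is a shared inventory that maps each specifically named data asset to its category and location, so that every office describes an incident in the same terms. The sketch below is only an illustration under that assumption; the asset names, categories, locations, and the `describe_incident` helper are all hypothetical.

```python
# Hypothetical inventory: specific asset name -> category and where it is held
DATA_INVENTORY = {
    "customer-contact database": {"category": "database", "location": "Denver office"},
    "website visitor logs": {"category": "qualitative data", "location": "cloud host"},
    "monthly sales totals": {"category": "quantitative data", "location": "headquarters"},
}


def describe_incident(asset_name: str) -> str:
    """Return a one-line description of the affected asset, using agreed names."""
    asset = DATA_INVENTORY.get(asset_name)
    if asset is None:
        # A vague label such as "data" will not match anything in the inventory
        return f"Unknown asset '{asset_name}': add it to the inventory before it is needed."
    return f"Affected: {asset_name} ({asset['category']}), held at {asset['location']}."


print(describe_incident("customer-contact database"))
print(describe_incident("data"))
```

Reporting "the customer-contact database" resolves to a category and a location, whereas reporting "data" resolves to nothing, which is the confusion the article warns about.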